What Makes a Good Plan? An Efficient Planning Approach to Control Diffusion Processes in Networks
In this paper, we analyze the quality of a large class of simple dynamic
resource allocation (DRA) strategies which we name priority planning. Their aim
is to control an undesired diffusion process by distributing resources to the
contagious nodes of the network according to a predefined priority order. In
our analysis, we reduce the DRA problem to the linear arrangement of the nodes
of the network. Under this perspective, we shed light on the role of a
fundamental characteristic of this arrangement, the maximum cutwidth, for
assessing the quality of any priority planning strategy. Our theoretical
analysis validates the role of the maximum cutwidth by deriving bounds for the
extinction time of the diffusion process. Finally, using the results of our
analysis, we propose a novel and efficient DRA strategy, called Maximum
Cutwidth Minimization, that outperforms other competing strategies in our
simulations. Comment: 18 pages, 3 figures
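A minimal sketch of the cut-counting quantity the analysis revolves around, assuming a toy contact network and a hypothetical priority order; this is not the paper's Maximum Cutwidth Minimization strategy itself, only the maximum cutwidth of a given linear arrangement:

# Sketch: maximum cutwidth of a linear arrangement (priority order) of a graph.
# The edge list and the order below are hypothetical toy inputs.
def max_cutwidth(edges, order):
    pos = {node: i for i, node in enumerate(order)}
    width = 0
    for i in range(len(order) - 1):
        # edges with exactly one endpoint among the first i+1 positions cross this cut
        crossing = sum(1 for u, v in edges if (pos[u] <= i) != (pos[v] <= i))
        width = max(width, crossing)
    return width

edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]  # toy contact network
order = [0, 1, 2, 3]                              # a candidate priority order
print(max_cutwidth(edges, order))                 # prints 3 for this arrangement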
Multivariate Hawkes Processes for Large-scale Inference
In this paper, we present a framework for fitting multivariate Hawkes
processes for large-scale problems both in the number of events in the observed
history and the number of event types (i.e. dimensions). The proposed
Low-Rank Hawkes Process (LRHP) framework introduces a low-rank approximation of
the kernel matrix that makes it possible to perform the nonparametric learning of
the triggering kernels using at most O(ndr^2) operations, where n is the number
of events, d the number of event types, and r the rank of the approximation
(r << d). This comes as a major improvement over existing state-of-the-art
inference algorithms, which require O(nd^2) operations.
Furthermore, the low-rank approximation allows LRHP to learn representative
patterns of interaction between event types, which may be valuable for the
analysis of such complex processes in real-world datasets. The efficiency and
scalability of our approach are illustrated with numerical experiments on
simulated as well as real datasets. Comment: 16 pages, 5 figures
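A minimal NumPy sketch of the low-rank idea behind LRHP, not its inference algorithm: a rank-r factorization stores O(d*r) numbers instead of the d^2 entries of the full kernel matrix. The matrix below is synthetic and chosen to be exactly rank r:

# Sketch: rank-r approximation of a d x d excitation matrix (synthetic example).
import numpy as np

d, r = 500, 10                                   # event types, approximation rank (r << d)
rng = np.random.default_rng(0)
K = rng.random((d, r)) @ rng.random((r, d))      # a d x d matrix of rank r

U, s, Vt = np.linalg.svd(K, full_matrices=False)
K_r = (U[:, :r] * s[:r]) @ Vt[:r, :]             # keep only the top-r factors

print(np.allclose(K, K_r))                       # True: O(d*r) numbers suffice here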
A framework for paired-sample hypothesis testing for high-dimensional data
The standard paired-sample testing approach in the multidimensional setting
applies multiple univariate tests on the individual features, followed by
p-value adjustments. Such an approach suffers when the data carry numerous
features. A number of studies have shown that classification accuracy can be
seen as a proxy for two-sample testing. However, neither theoretical
foundations nor practical recipes have been proposed so far on how this
strategy could be extended to multidimensional paired-sample testing. In this
work, we put forward the idea that scoring functions can be produced by the
decision rules defined by the perpendicular bisecting hyperplanes of the line
segments connecting each pair of instances. Then, the optimal scoring function
can be obtained as the pseudomedian of those rules, which we estimate by
naturally extending the Hodges-Lehmann estimator. We accordingly propose a
framework of a two-step testing procedure. First, we estimate the bisecting
hyperplanes for each pair of instances and an aggregated rule derived through
the Hodges-Lehmann estimator. The paired samples are scored by this aggregated
rule to produce a unidimensional representation. Second, we perform a Wilcoxon
signed-rank test on the obtained representation. Our experiments indicate that
our approach achieves substantial gains in testing accuracy compared to
traditional multivariate and multiple testing, while at the same time estimating
each feature's contribution to the final result. Comment: 35th IEEE International
Conference on Tools with Artificial Intelligence (ICTAI). 6 pages, 3 figures
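A hypothetical end-to-end sketch of the two-step procedure on simulated paired data; the aggregation below uses a plain coordinate-wise median of the hyperplane normals as a stand-in for the Hodges-Lehmann pseudomedian described in the abstract:

# Sketch: score paired samples with an aggregated bisecting-hyperplane rule,
# then run a Wilcoxon signed-rank test on the 1-D scores. Data are simulated.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
n, d = 50, 20
X = rng.normal(size=(n, d))                      # first measurement per subject
Y = X + rng.normal(0.3, 1.0, size=(n, d))        # paired measurement, shifted

# Step 1: per-pair hyperplane normals, aggregated into one scoring direction.
normals = Y - X
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
w = np.median(normals, axis=0)                   # simplified aggregation (see caveat above)
scores_x, scores_y = X @ w, Y @ w                # unidimensional representation

# Step 2: Wilcoxon signed-rank test on the paired scores.
stat, pval = wilcoxon(scores_x, scores_y)
print(f"statistic={stat:.1f}, p-value={pval:.3g}")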
To tree or not to tree? Assessing the impact of smoothing the decision boundaries
When analyzing a dataset, it can be useful to assess how smooth the decision
boundaries need to be for a model to better fit the data. This paper addresses
this question by proposing to quantify how much the 'rigid' decision boundaries,
produced by an algorithm that naturally finds such solutions, should be relaxed
to obtain a performance improvement. The approach we
propose starts with the rigid decision boundaries of a seed Decision Tree (seed
DT), which is used to initialize a Neural DT (NDT). The initial boundaries are
challenged by relaxing them progressively through training the NDT. During this
process, we measure the NDT's performance and its decision agreement with its
seed DT. We show how these two measures can help the user figure out how
expressive the model should be, before exploring it further via model
selection. The validity of our approach is demonstrated with experiments on
simulated and benchmark datasets. Comment: 12 pages, 3 figures, 3 tables. arXiv
admin note: text overlap with arXiv:2006.1145
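A hypothetical sketch of the two monitoring quantities described above, test accuracy and decision agreement with the seed DT; a small MLP stands in for the Neural DT, whose tree-based initialization is not reproduced here:

# Sketch: track test accuracy and agreement with a seed decision tree while a
# smoother model is trained progressively. Dataset and models are stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

seed_dt = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
relaxed = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1,
                        warm_start=True, random_state=0)

for epoch in range(20):                          # relax the boundaries progressively
    relaxed.fit(X_tr, y_tr)                      # one more pass thanks to warm_start
    acc = accuracy_score(y_te, relaxed.predict(X_te))
    agree = np.mean(relaxed.predict(X_te) == seed_dt.predict(X_te))
    print(f"epoch {epoch:2d}  accuracy={acc:.3f}  agreement={agree:.3f}")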